The qgam R package offers methods for fitting additive quantile regression models based on splines, using the methods described in Fasiolo et al., 2017. It is useful to use qgam together with mgcViz, which extends the basic visualizations provided by mgcv.
The main fitting functions are:
qgam() fits an additive quantile regression model to a single quantile. Very similar to mgcv::gam(). It returns an object of class qgam.mqgam() fits the same additive quantile regression model to several quantiles. It is more efficient that calling qgam() several times, especially in terms of memory usage.qgamV() and mqgamV() are two convenient wrappers that fit quantile GAMs and return gamViz objects, for which mgcViz provides lots of visualizations.Let us consider the motorcycle data set:
library(MASS)
data(mcycle)
plot(mcycle)
A good Gaussian GAMLSS model for this data is
library(mgcViz)
fitG <- gamV(list(accel ~ s(times, k=20, bs="ad"),
~ s(times)),
data=mcycle,
family=gaulss())
print(plot(fitG), pages = 1)
We can obtain the quantiles by inverting the estimated Gaussian CDF
qu <- c(0.1, 0.25, 0.5, 0.75, 0.9)
q_hat <- lapply(qu,
function(.q)
qnorm(.q, mean = fitG$fitted.values[ , 1],
sd = 1 / fitG$fitted.values[ , 2]))
and we can then plot them on top of the data
plot(mcycle, ylim = c(-150, 75))
for(ii in 1:5) lines(mcycle$times, q_hat[[ii]], col = 2)
Now we fit a quantile GAM for quantile 0.9
fit09 <- qgam(accel ~ s(times, k=20, bs="ad"),
data = mcycle,
qu = 0.9)
## Estimating learning rate. Each dot corresponds to a loss evaluation.
## qu = 0.9....................done
We now plot the quantile with confidence intervals
pred <- predict(fit09, se = TRUE)
plot(mcycle, ylim = c(-140, 85))
lines(mcycle$times, pred$fit)
lines(mcycle$times, pred$fit + 2 * pred$se, col = 2)
lines(mcycle$times, pred$fit - 2 * pred$se, col = 2)
There are two problems with this fit:
times < 15.We address these issues by modelling the learning rate as well
fit09_2 <- qgam(list(accel ~ s(times, k=20, bs="ad"),
~ s(times)),
data = mcycle,
qu = 0.9)
## Estimating learning rate. Each dot corresponds to a loss evaluation.
## qu = 0.9..........done
Again predict and plot
pred <- predict(fit09_2, se = TRUE)
plot(mcycle, ylim = c(-140, 90))
lines(mcycle$times, pred$fit)
lines(mcycle$times, pred$fit + 2 * pred$se, col = 2)
lines(mcycle$times, pred$fit - 2 * pred$se, col = 2)
Looks much better! Recall that
class(fit09_2)
## [1] "qgam" "gam" "glm" "lm"
hence we can do, for instance
summary(fit09_2)
##
## Family: elf
## Link function: identity
##
## Formula:
## accel ~ s(times, k = 20, bs = "ad")
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.182 1.755 1.244 0.214
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(times) 11.05 12.66 393 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.676 Deviance explained = 97.3%
## -REML = 578.21 Scale est. = 1 n = 133
We can use mgcViz plots by converting as usual
fit09_2 <- getViz(fit09_2)
or using shortcut
fit09_2 <- qgam(list(accel ~ s(times, k=20, bs="ad"),
~ s(times)),
data = mcycle,
qu = 0.9)
We fit multiple quantiles using the mqgam function
fitMQ <- mqgam(list(accel ~ s(times, k=20, bs="ad"),
~ s(times)),
data = mcycle,
qu = c(0.1, 0.25, 0.5, 0.75, 0.9))
## Estimating learning rate. Each dot corresponds to a loss evaluation.
## qu = 0.5..............done
## qu = 0.25..........................done
## qu = 0.75.........done
## qu = 0.1..............done
## qu = 0.9..........done
This should be manipulated using the qdo function
qdo(fitMQ, 0.1, summary)
##
## Family: elf
## Link function: identity
##
## Formula:
## accel ~ s(times, k = 20, bs = "ad")
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -54.535 2.367 -23.04 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(times) 8.323 9.153 932.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.649 Deviance explained = 89.9%
## -REML = 593.53 Scale est. = 1 n = 133
To plot one of the fitted quantile models we do
qdo(fitMQ, 0.1, plot)
To plot effect of several quantile at once we use
mgcViz
fitMQ_viz <- getViz(fitMQ)
plot(fitMQ_viz)
We can plot the fitted quantile on the data
plot(mcycle, ylim = c(-150, 90))
for(ii in 1:5) lines(mcycle$times, predict(fitMQ_viz[[ii]]), col = 2)
NB output of getViz is simply a list of gam models, we don’t need to use qdo. But there is a memory price to pay (small for this small data set)
object.size(fitMQ) / 1e6
## 1 bytes
object.size(fitMQ_viz) / 1e6
## 1.2 bytes
To recap we can do
fit <- qgam(...) and then fit <- getViz(fit), or fit <- qgamV(...) directlyfit <- mqgam(...) and we use qdo(fit, 0.1, plot)fit <- mqgam(...), then fit <- getViz(fit) and finally plot(fit[[1]])fit <- mqgamV(...) andHere we are going to model weekly rainfall (mm/week) from over 600 weather station in the Parana state of Brazil. The data comes comes from the INLA R package. The meaning of most variables should be evident.
library(testGam)
data("parana")
plot(parana[parana$TIME==1, ]$LO, parana[parana$TIME==1, ]$LA, xlab = "LO", ylab = "LA")
We fit a quantile GAM for \(\tau = 0.8\) with an isotropic effect for longitude and latitude, a cyclic effect for the time of the year and smooth effects for distance from the sea and elevation:
library(mgcViz)
fit <- qgamV(PREC ~ s(LO, LA, k = 25) + s(seaDist) + s(TIME, bs = "cc") + s(EL),
data = parana,
qu = 0.8)
## Estimating learning rate. Each dot corresponds to a loss evaluation.
## qu = 0.8............done
print(plot(fit), pages = 1)
Hence we can do model checking in mgcViz. However standard tools are inadequate
qq(fit, CI = "normal")
But we can verify number of observations falling below the fit
mean( (parana$PREC - fit$fitted.values) < 0 )
## [1] 0.803384
and mgcViz provides a specific layer for doing this along one covariate
check1D(fit, x = "EL") + l_gridQCheck1D()
or two covariates
check2D(fit, x1 = "LO", x2 = "LA") + l_gridQCheck2D()
We can of course fit this model to several quantiles:
fitM <- mqgamV(PREC ~ s(LO, LA, k = 25) + s(seaDist) + s(TIME, bs = "cc") + s(EL),
data = parana,
qu = seq(0.1, 0.9, length.out = 5))
## Estimating learning rate. Each dot corresponds to a loss evaluation.
## qu = 0.5................done
## qu = 0.3...................done
## qu = 0.7...............done
## qu = 0.1......................done
## qu = 0.9............done
plot(fitM, select = 1)
print(plot(fitM, select = 2:4), pages = 1)
Notice that the spatial effect seems much stronger for high quantiles than for the low ones. The same is true for distance from the sea and seasonality (
TIME), while the effect of elevation is not significant from qu = 0.9. We can visualize the spatial effect in 3D as follows:
plotRGL(sm(fitM[[5]], 1))
We might also wonder whether the spatial effect changes with time. To verify this here we construct a tensor product smooth between the 2D thin-plate-spline spatial effect and the cyclical effect of time. We simplify the model by removing the effect of seaDist.
fit4 <- qgamV(PREC ~ te(LO, LA, TIME, d = c(2, 1), k = c(20, 10), bs = c("tp", "cc")) + s(EL),
data = parana,
qu = 0.9)
## Estimating learning rate. Each dot corresponds to a loss evaluation.
## qu = 0.9........done
plotSlice(sm(fit4, 1),
fix = list("TIME" = round(seq(1, 53, length.out = 6)))) + l_fitRaster()
You can plot any slice in 3D by doing:
plotRGL(sm(fit4, 1), fix = c("TIME" = 11))